Text Similarity from Image Contents using Statistical and Semantic Analysis Techniques
Plagiarism detection is one of the most researched areas in the Natural
Language Processing (NLP) community. A good plagiarism detector draws on a
range of NLP methods, including semantics, named entities, and paraphrases, and produces
detailed plagiarism reports. Detection of cross-lingual plagiarism requires
deep knowledge of various advanced methods and algorithms to perform effective
text similarity checking. Plagiarists, meanwhile, are becoming more
sophisticated at concealing their identity to avoid being caught. They
evade detection with techniques such as paraphrasing,
synonym replacement, mismatched citations, and translation from one language to
another. Image Content Plagiarism Detection (ICPD) has gained importance,
utilizing advanced image content processing to identify instances of plagiarism
to ensure the integrity of image content. The issue of plagiarism extends
beyond textual content, as images such as figures, graphs, and tables also have
the potential to be plagiarized. However, image content plagiarism detection
remains an unaddressed challenge. Therefore, there is a critical need to
develop methods and systems for detecting plagiarism in image content. In this
paper, a system is implemented to detect plagiarism in the contents of
images such as figures, graphs, and tables. Used alongside statistical measures
such as Jaccard and cosine similarity, semantic algorithms such as LSA, BERT, and
WordNet gave more efficient and accurate plagiarism detection.
Comment: NLPTT2023 publication, 10 pages
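As an illustration of the two statistical measures named in the abstract, the following is a minimal Python sketch of Jaccard and cosine similarity over word tokens. It is not the paper's implementation: the `tokenize` helper, the function names, and the use of raw term counts for the cosine vectors are assumptions made for this example, and the text is assumed to have already been extracted from the image.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens.

    Hypothetical helper; the paper's preprocessing is not described here.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(text_a, text_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokenize(text_a)), set(tokenize(text_b))
    if not set_a and not set_b:
        return 0.0  # convention chosen for this sketch: two empty texts score 0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two raw term-count vectors."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

Both measures compare surface tokens only, so a synonym substitution lowers the score even when the meaning is unchanged. This is the gap that semantic methods such as those listed in the abstract are meant to close.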
Marathi-English Code-mixed Text Generation
Code-mixing, the blending of linguistic elements from distinct languages to
form meaningful sentences, is common in multilingual settings, yielding hybrid
languages like Hinglish and Minglish. Marathi, India's third most spoken
language, often integrates English for precision and formality. Developing
code-mixed language systems, like Marathi-English (Minglish), faces resource
constraints. This research introduces a Marathi-English code-mixed text
generation algorithm, assessed with Code Mixing Index (CMI) and Degree of Code
Mixing (DCM) metrics. Across 2987 code-mixed questions, it achieved an average
CMI of 0.2 and an average DCM of 7.4, indicating effective and comprehensible
code-mixed sentences. These results offer potential for enhanced NLP tools,
bridging linguistic gaps in multilingual societies.
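The abstract does not define CMI. One widely used formulation, due to Das and Gambäck, is 1 - max(w_i) / (n - u), where n is the number of tokens, u the number of language-independent tokens, and max(w_i) the token count of the dominant language. The Python sketch below assumes that formulation and returns a fraction between 0 and 1 (the original definition multiplies by 100; which scale the paper reports is not stated here). The tag names `mr`, `en`, and `univ` are hypothetical labels chosen for this example.

```python
from collections import Counter


def code_mixing_index(tags, neutral_tag="univ"):
    """Compute a Code Mixing Index for one utterance from per-token language tags.

    Assumes the formulation 1 - max(w_i) / (n - u); `neutral_tag` marks
    language-independent tokens such as punctuation or numbers.
    """
    n = len(tags)
    lang_counts = Counter(t for t in tags if t != neutral_tag)
    u = n - sum(lang_counts.values())  # number of language-independent tokens
    if n == u:
        return 0.0  # no language-tagged tokens, so no mixing to measure
    return 1.0 - max(lang_counts.values()) / (n - u)
```

For example, an utterance tagged `["mr", "mr", "mr", "en", "univ"]` has n = 5, u = 1, and a dominant-language count of 3, giving 1 - 3/4 = 0.25, while a monolingual utterance scores 0.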